
refactor: replace galactic-agent with galactic-router (controller-runtime) #120

Closed
privateip wants to merge 9 commits into main from refactor/galactic-router-replace-agent



privateip (Contributor) commented Jun 21, 2026

Summary

This PR is the complete implementation of galactic-router, which implements the BGP CRDs from Milo.

This is the final commit for the galactic-router rewrite branch. It adds comprehensive test coverage for the controllers, the reconcile logic, and the FRR stub, and updates the review plan to mark all phases complete.

This patch is a major refactor that replaces galactic-agent with galactic-router, a controller-runtime based BGP control-plane reconciler for Galactic.

What changed

The new architecture splits Galactic into two node-local binaries:

  • galactic-cni wires pods into VPC networks, creates VRF/veth/SRv6 state, and writes BGPAdvertisement CRDs.
  • galactic-router watches Cosmos BGP CRDs and drives an embedded GoBGP runtime to distribute EVPN paths.

Core behavior

galactic-router requires the NODE_NAME and ROUTER_ROLE environment variables. It supports a tenant role backed by GoBGP and, for now, a fabric role backed by an FRR stub. It runs a controller-runtime manager, serves metrics on :8080, and exposes a gRPC health server on :5000.
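As a rough illustration of the startup contract above, here is a minimal Go sketch of fail-fast configuration loading. The `loadConfig` function, the `Config` and `RouterRole` types, and the literal role values `tenant` and `fabric` are assumptions for illustration, not the actual galactic-router code. The lookup function is injected so `os.LookupEnv` can be swapped for a map in tests.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// RouterRole is a hypothetical type for the ROUTER_ROLE values described in the PR.
type RouterRole string

const (
	RoleTenant RouterRole = "tenant" // assumed value: GoBGP-backed runtime
	RoleFabric RouterRole = "fabric" // assumed value: FRR stub for now
)

// Config holds the two required settings.
type Config struct {
	NodeName string
	Role     RouterRole
}

// loadConfig reads NODE_NAME and ROUTER_ROLE through lookup and returns an
// error when either is missing or the role is not one of the known values.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	node, ok := lookup("NODE_NAME")
	if !ok || node == "" {
		return Config{}, errors.New("NODE_NAME is required")
	}
	role, ok := lookup("ROUTER_ROLE")
	if !ok || role == "" {
		return Config{}, errors.New("ROUTER_ROLE is required")
	}
	switch r := RouterRole(role); r {
	case RoleTenant, RoleFabric:
		return Config{NodeName: node, Role: r}, nil
	default:
		return Config{}, fmt.Errorf("unsupported ROUTER_ROLE %q (want %q or %q)", role, RoleTenant, RoleFabric)
	}
}

// envLookup is the lookup a real binary would pass to loadConfig.
var envLookup = os.LookupEnv
```

Failing at startup rather than at first reconcile keeps a misconfigured DaemonSet pod from ever reporting healthy.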

The reconcile layer translates BGPRouter, BGPPeer, BGPAdvertisement, and BGPPolicy resources into a DesiredRouter. Along the way it filters resources by target node and router role, validates AFI/SAFI and timers, resolves peer secrets, and uses the node's IPv6 InternalIP as the EVPN next-hop.
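The next-hop selection step can be sketched as follows, assuming the node's addresses are available as (type, address) pairs shaped like Kubernetes `corev1.NodeAddress`. The `NodeAddress` type and `evpnNextHop` helper are hypothetical stand-ins that keep the example dependency-free; the real reconciler presumably reads the Node object through controller-runtime.

```go
package main

import (
	"errors"
	"net/netip"
)

// NodeAddress mirrors the shape of corev1.NodeAddress (hypothetical stand-in).
type NodeAddress struct {
	Type    string // e.g. "InternalIP", "ExternalIP", "Hostname"
	Address string
}

// evpnNextHop returns the first InternalIP that is a native IPv6 address.
// IPv4 and IPv4-mapped IPv6 addresses are skipped.
func evpnNextHop(addrs []NodeAddress) (netip.Addr, error) {
	for _, a := range addrs {
		if a.Type != "InternalIP" {
			continue
		}
		ip, err := netip.ParseAddr(a.Address)
		if err != nil {
			continue // skip malformed entries instead of failing the whole reconcile
		}
		if ip.Is6() && !ip.Is4In6() {
			return ip, nil
		}
	}
	return netip.Addr{}, errors.New("node has no IPv6 InternalIP")
}
```

Returning an error for IPv4-only nodes lets the reconciler surface a clear status instead of advertising EVPN routes with an unusable next-hop.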

Verification

  • go build ./... — zero errors
  • go test ./internal/cni/ ./internal/reconcile/ ./internal/controller/ ./internal/hash/ — all pass
  • go vet ./... — zero warnings
  • task lint — 0 issues
  • go fmt ./... — no unformatted files

privateip requested a review from a team as a code owner June 21, 2026 13:45
privateip requested a review from ronggur June 21, 2026 13:45
privateip marked this pull request as draft June 21, 2026 13:47
privateip force-pushed the refactor/galactic-router-replace-agent branch from b87216d to a653dc1 June 21, 2026 13:57
…time)

Replace the gRPC-based galactic-agent DaemonSet with a controller-runtime based galactic-router.

Key changes:

- Remove internal/agent, internal/bootstrap, internal/gobgp packages
- Add internal/controller with BGPRouter, BGPPeer, BGPAdvertisement, BGPPolicy, Secret, Node reconcilers
- Add internal/reconcile for CRD-to-DesiredRouter translation
- Add internal/runtime with RuntimeFactory pattern (GoBGP tenant, FRR fabric stub)
- Add internal/model for internal BGP types and internal/hash for change detection
- Update deployment manifests, Dockerfile, containerlab config, and docs
- Switch health probes to gRPC on port 5000; remove HTTP health and webhook ports
- GoBGP starts lazily on first BGPRouter reconcile (listenPort=-1, outbound-only)
- Hash-based no-op suppression prevents redundant GoBGP Apply calls
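Hash-based no-op suppression can be sketched like this: hash a canonical serialization of the desired state and call Apply only when the digest changes. `DesiredRouter` here is a hypothetical, trimmed-down stand-in, and SHA-256 over `encoding/json` output is an assumed choice; the PR's internal/hash package may use a different encoding or algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"sync"
)

// DesiredRouter is a hypothetical, trimmed-down stand-in for the PR's model type.
type DesiredRouter struct {
	ASN      uint32   `json:"asn"`
	RouterID string   `json:"routerID"`
	Peers    []string `json:"peers"`
}

// hashDesired returns a hex SHA-256 digest of the JSON encoding. json.Marshal
// emits struct fields in declaration order, so equal values hash equally.
func hashDesired(d DesiredRouter) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// applier calls apply only when the desired state's hash has changed.
type applier struct {
	mu       sync.Mutex
	lastHash string
	apply    func(DesiredRouter) error
}

// Reconcile reports whether apply was called.
func (a *applier) Reconcile(d DesiredRouter) (bool, error) {
	h, err := hashDesired(d)
	if err != nil {
		return false, err
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if h == a.lastHash {
		return false, nil // no-op: unchanged since the last successful apply
	}
	if err := a.apply(d); err != nil {
		return false, err // keep the old hash so the next reconcile retries
	}
	a.lastHash = h
	return true, nil
}
```

Because `encoding/json` preserves slice order, the caller should sort peer and prefix lists before hashing; otherwise a reordered but equivalent list would trigger a spurious Apply.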
privateip force-pushed the refactor/galactic-router-replace-agent branch from a653dc1 to efc8cb5 June 22, 2026 13:19
privateip marked this pull request as ready for review June 22, 2026 16:13
- Replace galactic-agent with galactic-router container image
- Replace BGPInstance/BGPPeer CRDs with BGPRouter/BGPPeer/BGPAdvertisement
- Replace infra cluster with dfw cluster (three-region: dfw, iad, sjc)
- Replace infra route reflector with iad-worker-rr node
- Remove cosmos operator deployment from containerlab
- Update NAD configs to use galacticRouter instead of gobgp
- Add BGP CRD patches to fix ASN maximum for kubebuilder v0.18.0
- Update all documentation, Taskfile, and scripts accordingly
privateip force-pushed the refactor/galactic-router-replace-agent branch from b92bc59 to e634f6c June 22, 2026 16:24
privateip and others added 4 commits June 22, 2026 17:18
Replaces the ErrEVPNNotImplemented stub with a real implementation that
builds and advertises EVPN Type 5 IP Prefix routes for each SRv6
endpoint prefix in a BGPAdvertisement. The route distinguisher is
derived from the BGPRouter's routerID (Type 1 IP-address:0), the
MpReachNLRI next-hop is the node's primary IPv6 address, and route
target communities are parsed from the advertisement's communities
field. Withdrawal is also supported via b.DeletePath.
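To make the RD and route-target encodings concrete, the sketch below builds an RFC 4364 Type 1 route distinguisher (type 0x0001, 4-byte IPv4 administrator, 2-byte assigned number) from a router ID, and parses an `ASN:NN` string into an RFC 4360 two-octet-AS-specific route-target extended community. The helper names and the accepted `target:ASN:NN` text format are assumptions; the real implementation likely relies on GoBGP's packet types instead of encoding bytes by hand.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// routeDistinguisherType1 encodes routerID:assigned as a Type 1 RD.
func routeDistinguisherType1(routerID string, assigned uint16) ([8]byte, error) {
	var rd [8]byte
	ip, err := netip.ParseAddr(routerID)
	if err != nil || !ip.Is4() {
		return rd, fmt.Errorf("routerID %q is not an IPv4 address", routerID)
	}
	binary.BigEndian.PutUint16(rd[0:2], 1) // RD type 1
	a4 := ip.As4()
	copy(rd[2:6], a4[:]) // administrator: IPv4 address
	binary.BigEndian.PutUint16(rd[6:8], assigned)
	return rd, nil
}

// parseRouteTarget parses "ASN:NN" (optionally prefixed with "target:") into
// a two-octet-AS-specific route target (type 0x00, subtype 0x02).
func parseRouteTarget(s string) ([8]byte, error) {
	var ec [8]byte
	s = strings.TrimPrefix(s, "target:")
	asStr, nnStr, ok := strings.Cut(s, ":")
	if !ok {
		return ec, fmt.Errorf("route target %q: want ASN:NN", s)
	}
	as, err := strconv.ParseUint(asStr, 10, 16) // two-octet ASN only
	if err != nil {
		return ec, fmt.Errorf("route target %q: bad ASN: %w", s, err)
	}
	nn, err := strconv.ParseUint(nnStr, 10, 32)
	if err != nil {
		return ec, fmt.Errorf("route target %q: bad local administrator: %w", s, err)
	}
	ec[0], ec[1] = 0x00, 0x02
	binary.BigEndian.PutUint16(ec[2:4], uint16(as))
	binary.BigEndian.PutUint32(ec[4:8], uint32(nn))
	return ec, nil
}
```

A Type 1 RD requires an IPv4 administrator, so a router ID that is not an IPv4 address is rejected; likewise, ASNs above 65535 would need the four-octet-AS-specific community form (RFC 5668).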

Also adds a configurable BGP_LISTEN_PORT environment variable to
galactic-router so the tenant GoBGP instance can bind on port 1790
(port 179 is occupied by the FRR underlay on each worker node).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
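A minimal sketch of how BGP_LISTEN_PORT might be parsed, assuming an unset variable keeps the outbound-only behavior described earlier (`listenPort=-1`). The `listenPortFromEnv` name, the `-1` default, and the range check are illustrative assumptions, not the PR's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultListenPort keeps GoBGP outbound-only (assumed default when unset).
const defaultListenPort int32 = -1

// listenPortFromEnv reads BGP_LISTEN_PORT through lookup and accepts either
// -1 or a TCP port in 1..65535.
func listenPortFromEnv(lookup func(string) (string, bool)) (int32, error) {
	v, ok := lookup("BGP_LISTEN_PORT")
	if !ok || v == "" {
		return defaultListenPort, nil
	}
	p, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("BGP_LISTEN_PORT=%q: %w", v, err)
	}
	if p != -1 && (p < 1 || p > 65535) {
		return 0, fmt.Errorf("BGP_LISTEN_PORT=%d is out of range", p)
	}
	return int32(p), nil
}
```

With the tenant DaemonSet setting BGP_LISTEN_PORT=1790, GoBGP would listen on 1790 while FRR keeps port 179 for the underlay.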
…ions

Update cosmos dependency to v0.0.0-20260622211233-0e38bdf25eac (PR #48)
which replaces individual FSM conditions (SessionIdle, SessionConnect, etc.)
with a consolidated Ready/Accepted condition model.

- Replace setPeerSessionState + setPeerCondition with setPeerReadyCondition
  that sets a single Ready condition using bgpv1alpha1.ConditionTypeReady
- Remove unused FSM condition constants (SessionIdle..SessionEstablished)
- Remove dead code: fsmConditions slice, fsmStateToCondition map
- Update review-plan.md: mark Phase 3.2 and 3.3 as DONE

Co-Authored-By: Claude <noreply@anthropic.com>
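The consolidated condition model can be sketched as a mapping from the BGP FSM state to a single Ready condition that is upserted by type. `Condition` is a dependency-free stand-in for `metav1.Condition`, and the `Reason*` names below are hypothetical, because the actual constants added to status.go are not listed in this PR.

```go
package main

import "strings"

// Condition is a dependency-free stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// Hypothetical reason constants.
const (
	ReasonSessionEstablished    = "SessionEstablished"
	ReasonSessionNotEstablished = "SessionNotEstablished"
	ReasonSessionStateUnknown   = "SessionStateUnknown"
)

// readyConditionFor collapses the six BGP FSM states into one Ready condition.
func readyConditionFor(fsmState string) Condition {
	c := Condition{Type: "Ready"}
	switch s := strings.ToLower(fsmState); s {
	case "established":
		c.Status, c.Reason = "True", ReasonSessionEstablished
	case "idle", "connect", "active", "opensent", "openconfirm":
		c.Status, c.Reason = "False", ReasonSessionNotEstablished
		c.Message = "BGP session is in state " + s
	default:
		c.Status, c.Reason = "Unknown", ReasonSessionStateUnknown
		c.Message = "unrecognized FSM state " + fsmState
	}
	return c
}

// setCondition replaces the condition with the same Type, or appends it.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}
```

Upserting by type is what keeps a peer's status down to one Ready entry, instead of one condition per FSM state.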
…nstants

- Add controller_test.go: indexes, enqueue helpers, node-to-router mapping,
  condition helpers (1015 lines)
- Add reconcile_test.go: BuildDesiredRouter, gatherPeers, gatherPolicies,
  AFI/timer/validation tests (1163 lines)
- Add frr_test.go: FRR stub tests (60 lines)
- Update review-plan.md: mark all phases DONE, add Phase 8 for new tests
- Add Reason* constants to status.go for BGP session state reasons
privateip changed the title refactor: replace galactic-agent with galactic-router (controller-runtime) chore: add controller/reconcile/frr tests, update review plan Jun 23, 2026
privateip changed the title chore: add controller/reconcile/frr tests, update review plan chore: add controller/reconcile/frr tests, update review plan + pin BGP source address via numbered underlay links Jun 23, 2026
privateip changed the title chore: add controller/reconcile/frr tests, update review plan + pin BGP source address via numbered underlay links chore: add controller/reconcile/frr tests, update review plan Jun 23, 2026
Replace BGP unnumbered (link-local) underlay peering with numbered
IPv6 /64 subnets between workers and transit routers. Add
BGP_LOCAL_ADDRESS env var to the galactic-router overlay DaemonSet
so GoBGP pins the TCP source address to the node SRv6 loopback.

Underlay: configure numbered IPv6 links and route-maps to set source
address on FRR BGP advertisements (SRv6 SID/forwarding prefixes).

GoBGP runtime: accept localAddress in NewRuntimeFactory, propagate
to peerFromDesired, set Transport.LocalAddress on every peer.
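The `Transport.LocalAddress` propagation can be sketched with simplified stand-in types. `DesiredPeer`, `Peer`, and `Transport` here are hypothetical and only loosely mirror the shape of GoBGP's API peer message. The address-family check is an added assumption, since a TCP session cannot be sourced from an IPv6 address toward an IPv4 neighbor.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Hypothetical stand-ins for the desired-state model and GoBGP's peer message.
type DesiredPeer struct {
	Address string
	ASN     uint32
}

type Transport struct{ LocalAddress string }

type Peer struct {
	NeighborAddress string
	PeerASN         uint32
	Transport       *Transport
}

// peerFromDesired builds a peer and, when localAddress is set, pins the TCP
// source address so the session originates from the node's SRv6 loopback.
func peerFromDesired(d DesiredPeer, localAddress string) (*Peer, error) {
	nbr, err := netip.ParseAddr(d.Address)
	if err != nil {
		return nil, fmt.Errorf("peer address %q: %w", d.Address, err)
	}
	p := &Peer{NeighborAddress: nbr.String(), PeerASN: d.ASN, Transport: &Transport{}}
	if localAddress == "" {
		return p, nil // leave source-address selection to the kernel
	}
	la, err := netip.ParseAddr(localAddress)
	if err != nil {
		return nil, fmt.Errorf("BGP_LOCAL_ADDRESS %q: %w", localAddress, err)
	}
	if la.Is6() != nbr.Is6() {
		return nil, fmt.Errorf("local address %s and neighbor %s are different address families", la, nbr)
	}
	p.Transport.LocalAddress = la.String()
	return p, nil
}
```

Setting the source address on every peer, rather than only on some, means the underlay route-maps only need to make one loopback address reachable.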

Docs: update containerlab README to reflect numbered links.
privateip force-pushed the refactor/galactic-router-replace-agent branch from 15fb427 to f96f5f2 June 23, 2026 01:47
privateip changed the title chore: add controller/reconcile/frr tests, update review plan refactor: replace galactic-agent with galactic-router (controller-runtime) Jun 23, 2026
…write

- Refresh AGENTS.md with accurate env vars, task commands, and entry points
- Add architecture revision section with corrected repo layout, entry points,
  config tables, module reference, external deps, testing, CI/CD, and Claude
  guidance sections
- Add markdown table alignment convention
- Update CLAUDE.md to match
AGENTS.md is now a concise dev quick-reference:
- Pointer to ARCHITECTURE.md for full architecture details
- Keep task commands, tech stack, deployments, dev entry points

ARCHITECTURE.md is the comprehensive reference:
- Merge revision corrections into main body (remove separate revision section)
- Fix Repository Layout: remove nonexistent internal/metrics/, add internal/metadata/
- Fix Key Design Decisions: correct interface naming format
- Update Components table to match actual layout
- Keep Configuration, Module/Package Reference, External Dependencies,
  Testing, CI/CD, For Claude sections
privateip marked this pull request as draft June 23, 2026 19:48
privateip (Contributor, Author) commented

This PR has gotten unwieldy and impossible to review. I have broken it up into multiple PRs to ease the review burden. Closing this one.

privateip closed this Jun 23, 2026
privateip deleted the refactor/galactic-router-replace-agent branch June 25, 2026 13:38

Labels

None yet

Projects

None yet


1 participant